Part of Speech Tagging Bilingual Speech Transcripts with Intrasentential Model Switching

نویسندگان

  • Paul Rodrigues
  • Sandra Kübler
چکیده

This paper investigates incremental part of speech tagging for speech transcripts that contain multilingual intrasentential code-mixing, and compares the accuracy of a monolithic tagging model trained on a heterogeneous-language dataset to a model that switches between two homogeneous-language tagging models dynamically using word-by-word language identification. We find that the dynamic model, even though presented a smaller context consisting of sentence fragments, meets the accuracy of the monolithic code-mixing model which is aware of increased context. Our system is modular, and is designed to be expanded to many-language code-mixing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

متن کامل

Part-of-Speech Tagging for English-Spanish Code-Switched Text

Code-switching is an interesting linguistic phenomenon commonly observed in highly bilingual communities. It consists of mixing languages in the same conversational event. This paper presents results on Part-of-Speech tagging Spanish-English code-switched discourse. We explore different approaches to exploit existing resources for both languages that range from simple heuristics, to language id...

متن کامل

Part of Speech Tagging for Code Switched Data

We address the problem of Part of Speech tagging (POS) in the context of linguistic code switching (CS). CS is the phenomenon where a speaker switches between two languages or variants of the same language within or across utterances, known as intra-sentential or inter-sentential CS, respectively. Processing CS data is especially challenging in intrasentential data given state of the art monoli...

متن کامل

Lexicon induction and part-of-speech tagging of non-resourced languages without any bilingual resources

We introduce a generic approach for transferring part-of-speech annotations from a resourced language to a non-resourced but etymologically close language. We first infer a bilingual lexicon between the two languages with methods based on character similarity, frequency similarity and context similarity. We then assign partof-speech tags to these bilingual lexicon entries and annotate the remai...

متن کامل

Unsupervised Multilingual Learning for POS Tagging

We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The key hypothesis of multilingual learning is that by combining cues from multiple languages, the structure of each becomes more apparent. We formulate a hierarchical Bayesian model for jointly predicting bilingual streams of part-of-speech tags. The model learns language-specific features while ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013